A Log-Linear Block Transliteration Model based on Bi-Stream HMMs
Authors
Abstract
We propose a novel HMM-based framework to accurately transliterate unseen named entities. The framework leverages features from letter alignments and letter n-gram pairs learned from available bilingual dictionaries. Letter classes, such as vowels/non-vowels, are integrated to further improve transliteration accuracy. The proposed transliteration system is applied to out-of-vocabulary named entities in statistical machine translation (SMT), and a significant improvement over the traditional transliteration approach is obtained. Furthermore, transliteration accuracy is improved further by incorporating an automatic spell-checker based on statistics collected from web search engines. The proposed system is implemented within our SMT system and applied to a real Arabic-to-English translation scenario.
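To make two ingredients of the abstract concrete (letter n-gram pairs learned from a dictionary, and a vowel/non-vowel letter-class stream), here is a minimal sketch. It is not the authors' model: it assumes the dictionary entries are already letter-aligned (the alignment step is not shown), counts source/target block pairs plus a class-stream version of each target block, and combines the two relative-frequency estimates in a simple log-linear score. The romanized source letters, toy entries, helper names (`letter_class`, `extract_blocks`, `conditional`, `loglinear_score`), weights, and probability floor are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical class inventory: target (English) letters are split into
# vowels ("V") and everything else ("C") for the class stream.
VOWELS = set("aeiou")


def letter_class(s: str) -> str:
    """Map each letter of a target block to its coarse class ('V' or 'C')."""
    return "".join("V" if ch.lower() in VOWELS else "C" for ch in s)


def extract_blocks(aligned_entries):
    """Count (source block, target block) pairs and their class-stream versions.

    Each entry is a list of (source_chunk, target_chunk) pairs whose letter
    alignment is assumed to be given already; no alignment is computed here.
    """
    letter_counts = Counter()
    class_counts = Counter()
    for entry in aligned_entries:
        for src, tgt in entry:
            letter_counts[(src, tgt)] += 1
            class_counts[(src, letter_class(tgt))] += 1
    return letter_counts, class_counts


def conditional(counts, src, tgt):
    """Relative-frequency estimate of P(tgt | src) from pair counts."""
    total = sum(c for (s, _), c in counts.items() if s == src)
    return counts[(src, tgt)] / total if total else 0.0


def loglinear_score(src, tgt, letter_counts, class_counts,
                    weights=(1.0, 0.5), floor=1e-6):
    """Weighted sum of log-probabilities from the letter and class streams.

    The weights and the floor for unseen events are illustrative, not tuned.
    """
    p_letter = conditional(letter_counts, src, tgt)
    p_class = conditional(class_counts, src, letter_class(tgt))
    return (weights[0] * math.log(max(p_letter, floor))
            + weights[1] * math.log(max(p_class, floor)))


# Toy letter-aligned entries with romanized source letters (hypothetical data).
entries = [
    [("m", "mu"), ("H", "ha"), ("m", "mma"), ("d", "d")],
    [("A", "a"), ("H", "h"), ("m", "ma"), ("d", "d")],
]
letter_counts, class_counts = extract_blocks(entries)

for candidate in ("mu", "ma", "xx"):
    print(candidate, loglinear_score("m", candidate, letter_counts, class_counts))
```

Because the two streams are combined in log space, a candidate block never seen as a letter pair (for example `"mo"` for source `"m"`) can still receive some support when its vowel/consonant pattern is common for that source letter. The abstract does not state how the paper weights or combines its streams, so this weighting is only one plausible choice.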
Similar resources
English-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009
This paper presents English-Hindi transliteration in the NEWS 2009 Machine Transliteration Shared Task adding source context modeling into state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framew...
Integrating Models Derived from non-Parametric Bayesian Co-segmentation into a Statistical Machine Transliteration System
The system presented in this paper is based upon a phrase-based statistical machine transliteration (SMT) framework. The SMT system’s log-linear model is augmented with a set of features specifically suited to the task of transliteration. In particular our model utilizes a feature based on a joint source-channel model, and a feature based on a maximum entropy model that predicts target grapheme...
Identification of Geochemical Anomalies Using Fractal and LOLIMOT Neuro-Fuzzy modeling in Mial Area, Central Iran
The Urumieh-Dokhtar Magmatic Arc (UDMA) is recognized as an important porphyry, disseminated, vein-type and polymetallic mineralization arc. The aim of this study is to identify and subsequently determine geochemical anomalies for exploration of Pb, Zn and Cu mineralization in Mial district situated in UDMA. Factor analysis, Concentration-Number (C-N) fractal model and Local Linear Model Tree (...
Integration of multiple feature sets for reducing ambiguity in automatic speech recognition
This thesis presents a method to investigate the extent to which articulatory based acoustic features can be exploited to reduce ambiguity in automatic speech recognition search. The method proposed is based on a lattice re-scoring paradigm implemented to integrate articulatory based features into automatic speech recognition systems. Time delay neural networks are trained as feature detectors ...